Summary Grids: Building Accurate Multidimensional Histograms
Abstract
Data summarization is essential to many data analysis tasks. In this paper we propose a simple but efficient data summarization algorithm that outputs a histogram for multidimensional data, and we present a comparative study of its behavior on different data distributions and against existing algorithms. The idea is to iteratively grow and modify regions of homogeneous data. This differs from the commonly used strategy of iteratively fracturing subspaces with straight lines. This work compares both strategies and concludes that the new technique performs better and yields good results. We also conclude that handling outliers separately is important for providing good approximations.
Similar resources
SASH: A Self-Adaptive Histogram Set for Dynamically Changing Workloads
Most RDBMSs maintain a set of histograms for estimating the selectivities of given queries. These selectivities are typically used for cost-based query optimization. While the problem of building an accurate histogram for a given attribute or attribute set has been well-studied, little attention has been given to the problem of building and tuning a set of histograms collectively for multidimens...
Cost Estimation Techniques for Database Systems
This dissertation is about developing advanced selectivity and cost estimation techniques for query optimization in database systems. It addresses the following three issues related to current trends in database research: estimating the cost of spatial selections, building histograms without looking at data, and estimating the selectivity of XML path expressions. The first part of this disserta...
Building Wavelet Histograms on Large Data in MapReduce
MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential. Among various data summarization tools, histograms have proven to be particularly important and useful for summarizing data, and the wavelet histogram is one...
Dynamic Maintenance of Wavelet-Based Histograms
In this paper, we introduce an efficient method for the dynamic maintenance of wavelet-based histograms (and other transform-based histograms). Previous work has shown that wavelet-based histograms provide more accurate selectivity estimation than traditional histograms, such as equi-depth histograms. But since wavelet-based histograms are built by a nontrivial mathematical procedure, namely, wav...
High-order central ENO finite-volume scheme for hyperbolic conservation laws on three-dimensional cubed-sphere grids
A fourth-order accurate finite-volume scheme for hyperbolic conservation laws on three-dimensional (3D) cubed-sphere grids is described. The approach is based on a central essentially non-oscillatory (CENO) finite-volume method that was recently introduced for two-dimensional compressible flows and is extended to 3D geometries with structured hexahedral grids. Cubed-sphere grids feature hexahedr...